===================================================
COMP.TEXT.SGML - FREQUENTLY ASKED QUESTIONS(c) v1.1
===================================================
UPDATED SECTIONS
================
07) Added/updated: Agfa SDMS, DocuBuild SGML, MacroTag, SGML/CALS
Translator, SGML Hammer, (Shaffstall) SGML Translator, Silversmith,
TABLETAG, TagWorX.
09) Added section on national/international Standards bodies.
INTRODUCTION
============
This article contains answers to questions that are frequently posted
to comp.text.sgml. It is intended for newcomers to the list and
SGML beginners.
This FAQ is maintained on a voluntary basis, but any comments or
additional information are welcome (see final section). It is a large
file, and has to be split into two parts to guarantee safe transmission
over the network (Part 1 contains sections 01-09, Part 2 contains
sections 10-end).
CONTENTS
========
01) What does "SGML" stand for? What is SGML?
02) What can SGML be used for?
03) What do I need to use SGML?
04) Books, Bibliographies, Newsletters and Journals
05) Newsgroups and discussion lists
06) Public Domain software
07) Commercial software
08) ftp archives
09) The SGML Users' Group, National Chapters, SIGs, Standards bodies
10) Conferences
11) SGML initiatives and major projects
12) SGML and other Standards
13) Introductory questions with answers - by Erik Naggum
14) Making comments/additions to this FAQ
If any of the terms used in this FAQ are unfamiliar to you, consult
the section on "Introductory questions with answers", the text of
the SGML standard (see 01.1), or a good book on SGML (see below).
01) What does "SGML" stand for? What is SGML?
==============================================
01.1 SGML stands for the Standard Generalized Markup Language. SGML is
defined in ISO 8879:1986 "Information Processing -- Text and Office
Systems -- Standard Generalized Markup Language (SGML)". A copy of
this document can be obtained from the International Organization for
Standardization (ISO) or your national standards body. It is not
available for ftp.
01.2 SGML enables the description of structured information _independent_
of how that information is processed. It is a meta-language that provides
a standard syntax for defining descriptions of classes of structured
information; these descriptions are called document type definitions
(DTDs). Information can be "marked up" according to a DTD, so that its
structure is made explicit and accessible. The "markup" can be checked
against a DTD to ensure that it is valid, and thus that the structure of
the information conforms to that of the class described by the DTD.
Ensuring that information is structured in a known way greatly facilitates
any subsequent use of that information. For more information, beginners
should read Erik Naggum's "Introductory questions with answers"
(in this FAQ), consult ISO 8879 and/or a book on SGML (also in this FAQ).
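To make the idea of checking markup against a DTD concrete, here is a
minimal sketch in Python. It is not an SGML parser and does not read real
DTD syntax: the element names, the TOY_DTD table and the validate function
are all assumptions made for illustration. Each content model is written
as a regular expression over the names of an element's children, which
mirrors the way a DTD says what may appear inside each element.

```python
import re

# Hypothetical toy "DTD": each element maps to a regular expression over the
# space-terminated names of its children. "#PCDATA" stands for character data.
TOY_DTD = {
    "paper":    r"title (abstract )?(section )+",
    "title":    r"(#PCDATA )*",
    "abstract": r"(para )+",
    "section":  r"title (para )+",
    "para":     r"(#PCDATA )*",
}

def validate(node, dtd=TOY_DTD):
    """Return a list of error strings for a (name, children) tree."""
    name, children = node
    if name not in dtd:
        return [f"undeclared element: {name}"]
    # Describe the children as a sequence of names; plain strings are data.
    seq = "".join(("#PCDATA" if isinstance(c, str) else c[0]) + " "
                  for c in children)
    errors = []
    if not re.fullmatch(dtd[name], seq):
        errors.append(
            f"content of <{name}> does not match its model: {seq.strip()!r}")
    # Check every child element against its own content model.
    for c in children:
        if not isinstance(c, str):
            errors.extend(validate(c, dtd))
    return errors

# A document instance that follows the toy rules.
good = ("paper", [
    ("title", ["On Markup"]),
    ("abstract", [("para", ["A short summary."])]),
    ("section", [("title", ["Introduction"]), ("para", ["Body text."])]),
])

# The same document with its title left out.
bad = ("paper", [
    ("abstract", [("para", ["Summary first, title missing."])]),
    ("section", [("title", ["Introduction"]), ("para", ["Body text."])]),
])
```

Calling validate(good) returns an empty list, while validate(bad) reports
that the content of the paper element does not match its model. A real
SGML parser does far more (entities, tag minimization, attributes), so
treat this only as a picture of what "valid against a DTD" means.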
01.3 DTDs define the rules for structuring information but do _not_ say how that
information should be processed. Therefore, SGML and DTDs do not deal
with how, say, a document should be processed for formatting on paper
(via LaTeX), display on-line (via Hypercard), and mapping into a document
database (via Oracle) -- but, having made the structure of the document
explicit, they enable all these subsequent processes to use exactly
the same source document. SGML is _not_ a replacement for de facto
standards such as TeX or PostScript.
01.4 SGML is non-proprietary. Publication of, and amendments to, the
International Standard are controlled solely by the ISO.
01.5 Use of SGML is not confined to any particular make or type of
computer or software. SGML-aware products are available for most types
of machine.
01.6 SGML is not user-unfriendly. What SGML is, what it looks like "in the
raw", and how other software is able to make use of SGML markup, will be
of little concern to most users. Sophisticated packages now exist to
create, edit, and manipulate information that has been marked up with SGML,
e.g. quasi-WYSIWYG editors for creating SGML documents that conform to
any given valid DTD.
02) What can SGML be used for?
==============================
02.1 Its uses are many and diverse. SGML DTDs (see 01.2) define the markup
and markup rules that can be used for a given class of documents (where
"document" is a file of information). A DTD is usually written with
some kind of end processing in mind - but since SGML markup is
application-independent, documents that conform to a particular DTD
can be re-used in a variety of different ways.
02.2 For example, a document designer might write a DTD that enables the
abstract of a scientific paper to be marked up as such. The primary
purpose is that the text identified as forming part of a paper's
abstract can then be formatted in a particular way when the SGML source
document is translated into a file usable by a text processing system.
If, at some later date, it is decided that abstracts should be formatted
in a different way, it is only necessary to alter the translation program
and not every instance of an abstract in every paper that has to be
(re-)printed out. Moreover, knowing that in every document conforming
to the particular DTD, abstracts will be identified as such, it is a
trivial matter to combine papers supplied by several authors into a
collection that has a uniform physical appearance, or to produce a
catalogue of abstracts for publication or inclusion in a database.
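The abstract example can be sketched in a few lines of Python. Everything
here is assumed for illustration: the element names, the two sample
documents, and the STYLE table. The regular-expression lookup only works
on this fully tagged toy input; real SGML documents need a proper parser.
The point is that the formatting decision lives in one place, and that
the same markup also serves a second purpose (a catalogue).

```python
import re

# Two hand-written sample documents using hypothetical element names.
PAPERS = [
    "<paper><title>On Markup</title>"
    "<abstract>Markup makes structure explicit.</abstract></paper>",
    "<paper><title>On Parsers</title>"
    "<abstract>Parsers check markup against a DTD.</abstract></paper>",
]

def element_text(doc, name):
    """Return the text of the first <name>...</name> element, or None."""
    m = re.search(rf"<{name}>(.*?)</{name}>", doc, re.DOTALL)
    return m.group(1) if m else None

# The only place that knows how a title or an abstract should look on output.
STYLE = {
    "title": lambda text: text.upper(),
    "abstract": lambda text: "    " + text,   # indented block
}

def render(doc, style=STYLE):
    """Format the title and abstract of one paper using the style table."""
    return "\n".join(style[name](element_text(doc, name))
                     for name in ("title", "abstract"))

def catalogue(papers):
    """Collect (title, abstract) pairs from many papers for a catalogue."""
    return [(element_text(p, "title"), element_text(p, "abstract"))
            for p in papers]
```

To format abstracts differently, only the "abstract" entry of the style
table changes; the documents themselves are left untouched, and
catalogue(PAPERS) keeps working on the same source.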
02.3 If you want to know whether SGML is appropriate for a particular task,
consult the current discussion lists, journals, and Special Interest
Groups (SIGs), and/or post to a newsgroup such as comp.text.sgml. People
are always willing and eager to hear about new ways that SGML might be
used.
03) What do I need to use SGML?
===============================
03.1 In order to use SGML you will need an SGML parser (that conforms to
ISO 8879), an entity manager, an editor to produce your DTDs and/or
SGML documents, and probably some sort of translation program to convert
your SGML documents into a form suitable for some specific processing.
If you are planning to convert existing information into SGML documents,
you will need some sort of "retro-tagging" or auto-conversion software.
03.2 ISO 8879 contains very precise definitions of terms such as
"SGML system", "SGML application", "SGML parser", "entity manager" and
so on. Users are advised to consult the text of ISO 8879 carefully, as
mis-use of terms defined in the Standard can lead to misunderstandings.
03.3 The following definitions are taken from ISO 8879, but readers are
advised to consult the full text:
4.279 SGML application: Rules that apply SGML to a text
processing application. An SGML application includes a formal
specification of the markup constructs used in the application,
expressed in SGML. It can also include a non-SGML definition of
semantics, application conventions, and/or processing.
4.287 SGML system: A system that includes an SGML parser, an
entity manager, and both or either of:
a) an implementation of one or more SGML applications; and/or
b) facilities for a user to implement SGML applications, with
access to the SGML parser and entity manager.
4.285 SGML parser: A program (or portion of a program or a
combination of programs) that recognizes markup in SGML
documents.
4.123 entity manager: A program (or portion of a program or a
combination of programs), such as a file system or symbol table,
that can maintain and provide access to multiple entities.
4.120 entity: A collection of characters that can be referenced as
a unit.
NB. In note (b) of the "Scope" section of ISO 8879:1986, it states
that the Standard does NOT "Specify the implementation, architecture,
or markup error handling of conforming systems". In the glossary
to his book (see below) Eric van Herwijnen defines "SGML
implementation: A collection of SGML application procedures that....
provide the mapping from the structure defined by a given SGML
application to a concrete system such as a textformatter or a
database."
Note: Beginners may not find any of these definitions enlightening. Be
aware that some posters use the terminology of ISO 8879 very rigorously,
whilst others are more lax. This opens the door for misunderstandings.
Unless you are sure that you are using terminology in the correct way, as
taken from ISO 8879, please try to be as explicit and unambiguous as possible.
03.4 Put simply, most users will obtain or write a DTD (see above). If
you write your own DTD, you will need to validate it using an SGML parser
that conforms to ISO 8879. To create SGML documents which conform to
a DTD you will need an editor and a parser. The editor is used to input
information and insert SGML markup into the document; the parser is used
to check that the markup and the way it has been used conform to the
rules given in the DTD. Many commercial packages offer syntax-directed
editors, which interactively ensure that any editing and markup operations
conform to the rules of the DTD.
03.5 Once you have a valid SGML document that conforms to a valid SGML
DTD, you may want to do some subsequent processing. For example, in
order to get paper output, you will need a program (or set of programs),
that can read your SGML document and produce a file acceptable to your
word processing package/text formatter. With a well-known publicly
available DTD, it may be possible to obtain a translation package that
has already been written; otherwise, you will need to write any translations
required for subsequent processing yourself.
03.6 Translating existing information into valid SGML documents can be
more problematic. SGML is good at handling structured information. You
will need to obtain/write a DTD which is suitable for representing the
structure (in full or in part) of your existing information. You will then
need to obtain/write translations which can take your existing information
and output it in an appropriate form that includes SGML markup which can be
validated against your chosen DTD. If your existing information already
contains an unambiguous structure which is clearly indicated, it should
be possible to convert this information into conforming SGML. If your
existing information is not clearly structured, or that structure is
ambiguous, conversion to SGML is much harder work. Complex information
structures will also involve much more effort to translate into conforming
SGML.
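As a rough illustration of "retro-tagging", the Python sketch below
converts plain text whose structure is clearly indicated into tagged
text. The input convention (first block is the title, each later
blank-line-separated block is a paragraph), the element names and the
retro_tag function are all assumptions; the output would still have to be
validated by a real SGML parser against a DTD that declares these
elements and the "amp" and "lt" entities.

```python
def retro_tag(text):
    """Wrap a plain-text memo in hypothetical <memo>, <title>, <para> tags.

    Assumed input convention: the first block is the title and every
    later blank-line-separated block is one paragraph.
    """
    # Split on blank lines and normalize whitespace inside each block.
    blocks = [" ".join(b.split()) for b in text.strip().split("\n\n")]
    blocks = [b for b in blocks if b]
    if len(blocks) < 2:
        # Without a clear title/body split we refuse to guess.
        raise ValueError(
            "structure unclear: need a title and at least one paragraph")

    def esc(s):
        # Replace markup-significant characters with entity references;
        # this assumes the target DTD declares "amp" and "lt".
        return s.replace("&", "&amp;").replace("<", "&lt;")

    out = ["<memo>", f"<title>{esc(blocks[0])}</title>"]
    out += [f"<para>{esc(b)}</para>" for b in blocks[1:]]
    out.append("</memo>")
    return "\n".join(out)
```

The conversion is easy here only because the input convention is
unambiguous. Text with no reliable cues for where titles, paragraphs or
lists begin would need hand-written rules or manual tagging.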
03.7 Always look out for existing DTDs/translation packages which may
meet your needs. If you write a DTD or means of translating to/from
SGML, consider sharing it with the rest of the SGML user community
(post it to a newsgroup or ftp site). This is a good way for all of us
to have access to well-written, tried and tested DTDs etc.
04) Books, Bibliographies, Newsletters and Journals
===================================================
This list does not pretend to be complete, nor does it offer any value
judgements about any of the products listed. Items in each category
are given in alphabetical order by author/title.
Some items are available at discounted rates to members of the SGML Users'
Group or the Graphic Communications Association (GCA) - see below.
Note: Robin Cover's on-line bibliography contains many more references,
and much more information on each text.
04.1 Books:
BRYAN, Martin "SGML: an author's guide to the standard generalized markup
language". Wokingham/Reading/New York: Addison-Wesley. 1988. 380 pages.
ISBN: 0-201-17535-5 (pbk).
GOLDFARB, Charles "The SGML Handbook". Oxford: Oxford University Press. 1990.
688 pages. ISBN: 0-19-853737-9 (hbk).
HERWIJNEN, Eric van "Practical SGML". Dordrecht/Boston/London: Kluwer
Academic Publishers. 1990. 307 pages. ISBN 0-7923-0635-X (pbk).
SMITH, Joan & STUTELY, Robert "SGML: the users' guide to ISO 8879". New York/
Chichester/Brisbane/Toronto: Ellis Horwood Limited/Halsted Press. 1988.
173 pages. ISBN 0-7458-0221-4 (Ellis Horwood Limited) (hbk).
ISBN 0-470-21126-1 (Halsted Press) (hbk).
SOFTQUAD Inc. "The SGML Primer". Toronto: SoftQuad Inc. Private printing,
available from SoftQuad Inc.
04.2 Bibliographies:
COVER, Robin & DUNCAN, Nicholas & BARNARD, David "BIBLIOGRAPHY ON SGML
(Standard Generalized Markup Language) AND RELATED ISSUES" (Technical
Report 91-299). Ontario: Queen's University at Kingston. 1991. 312 pages.
ISSN 0836-0227. 1991 Cost $21.00 (Canadian). Contact: Doug Hamilton,
Dept. of Computing & Information Science. Goodwin Hall, Queen's University,
Kingston, Ontario, CANADA K7L 3N6. Phone: (1-613) 545-6056. Email
(Internet): hamilton@qucis.queensu.ca
COVER, Robin "STANDARD GENERALIZED MARKUP LANGUAGE, ISO 8879:1986 (SGML)
ANNOTATED BIBLIOGRAPHY AND LIST OF RESOURCES" version 2.0 Revised
January 1992. (c) Robin Cover. Available on-line from many ftp sites.
Updates posted to comp.text.sgml etc. Contact: Robin Cover 6634 Sarah
Drive, Dallas, TX 75236 USA. Phone: (214) 296-1783. Fax: (214) 709-3387.
Email (Internet): robin@utafll.uta.edu.
04.3 Newsletters & Journals:
Note: There are several journals dedicated to CALS - see Robin Cover's
on-line Bibliography or contact the CALS SIG of the International SGML
Users' Group for details.
"EPSIG News" - A quarterly publication of information relating to the
ANSI/NISO manuscript standard Z39.59-1988 (also known as the "AAP"
standard). Available through EPSIG (address below). ISSN 1042-3737.
"SGML Users' Group Newsletter" - An occasional publication of news, events,
product announcements and short articles available through the
International SGML Users' Group (address below). ISSN 0952-8008
"SGML Users' Group Bulletin" - Longer/more technical papers than appear
in the SGML Users' Group Newsletter. Available through the International
SGML Users' Group (address below). ISSN 0269-2538.
"SGML SIGhyper Newsletter" - An occasional publication of the SGML Users'
Group Special Interest Group on Hypertext and Multimedia (SIGhyper).
Available through SGML SIGhyper (address below).
"<TAG> The SGML Newsletter" - Managing Editor: Brian Travis (Internet email:
brian@sgmlinc.com). 12 issues per year. Contact: Graphic
Communications Association. Phone: +1 703-519-8157. Fax: +1 703-548-2867.
05) Newsgroups and discussion lists
===================================
05.1 Newsgroups
comp.text.sgml - this usenet newsgroup. The main electronic forum for
discussion of SGML and closely related matters. Begun in late 1990.
All postings are archived at the ftp site maintained at Oslo University
in Norway (searching via WAIS and gopher is possible).
sgml@ifi.uio.no - electronic mailing list that echoes all postings to
comp.text.sgml for those who have difficulties with usenet news. To
subscribe:
mail: sgml@ifi.uio.no
subject: subscribe comp.text.sgml
body: (blank)
05.2 Discussion lists
sgml-l - electronic mailing list for discussion of SGML issues. Many
articles posted to comp.text.sgml are echoed to this list. To subscribe:
mail: listserv@dhdurz1
subject: (blank)
body: SUB SGML-L Michael Popham <NB. use your full name>
SIGNUP
sgml-math - electronic mailing list for discussion of issues relating to the
handling of math under the AAP Standard. DTD fragments are circulated for
comment. To subscribe:
mail: listserv@e-math.ams.com
subject: (blank)
body: subscribe sgml-math Michael Popham <NB. use your full name>
set sgml-math mail ack
help
sgml-tables - electronic mailing list for discussion of issues relating to the
handling of tables under the AAP Standard. DTD fragments are circulated for
comment. To subscribe:
mail: listserv@e-math.ams.com
subject: (blank)
body: subscribe sgml-tables Michael Popham <NB. use your full name>
set sgml-tables mail ack
help
tei-l - electronic mailing list for discussion/information relating to
the work of the Text Encoding Initiative (TEI). To subscribe:
mail: listserv@uicvm
subject: (blank)
body: SUB TEI-L Michael Popham <NB. use your full name>
SIGNUP
06) Public Domain software
==========================
Note: Public domain products are available from most of the anonymous
ftp archives. (The full addresses of many of the ftp archives are given
in Robin Cover's on-line Bibliography, or you could search available
archives using ARCHIE). Older public domain products are also available
from some ftp sites, but are not listed here.
ARC-SGML
A set of SGML Parser Materials, produced by Dr Charles Goldfarb
and made available through the SGML Users' Group. Contains source code
which can be used to build your own programs to handle SGML; also
contains a sample application called vm2. Copies on disk are
available through the GCA, SGML SIGhyper, and The SGML Project at the
University of Exeter. The original source code was written in C to
run on IBM compatible PCs under DOS. The original files and ports to many
operating systems and platforms (e.g. UNIX, Mac) are available for ftp.
(When searching ftp archives, look for directories/files with names like
"arcsgml" or "ARC-SGML").
ICA (Integrated Chameleon Architecture)
A code generating software architecture for producing translators between
different representations of electronic data. ICA is not SGML-specific.
Runs under UNIX, using X Windows (R4, R5). The ICA Project is based at
Ohio State University, and all new releases come from there. Available
for ftp from archive.cis.ohio-state.edu, under the directory pub/chameleon.
(The accompanying PostScript file of documentation runs to 186 pages).
Contact: Peter Ware <ware@cis.ohio-state.edu>
qwertz/FORMAT
An SGML to LaTeX and nroff/troff translator produced by
the Qwertz Project at the German National Centre for Computer Science.
The LaTeX document styles have been re-written as an SGML DTD (the
qwertz DTD). SGML documents can be created, and quickly mapped into
a format suitable for processing by a LaTeX or nroff/troff formatter. New
releases are announced on comp.text.sgml. The original code is available
for ftp from gmdzi.gmd.de [129.26.1.90] under the directory /pub/gmd
(get "sgml2latex-format.readme" and "sgml2latex-format.tar.Z").
sgmls
An SGML parser derived from the ARC-SGML Parser Materials, written
by James Clark. sgmls outputs a simple, line-oriented, ASCII representation
of an SGML document's Element Structure Information set which can be
easily parsed by awk, perl, C or whatever. The idea is that sgmls can be
used as the front end for a structure-controlled SGML application. New
releases are announced on comp.text.sgml. sgmls consists of C source
code intended to run under UNIX, but with instructions for porting/compiling
under DOS. Available for ftp. (Look for directories/files with names
like "sgmls", "jclark", "sgmls-0.8.tar.Z").
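To show why a line-oriented representation is easy to post-process, here
is a small Python sketch that reads sgmls-style output. It rests on my
understanding of the format, which you should check against the sgmls
documentation for your version: each line starts with a one-character
code, with "(" for the start of an element, ")" for its end, "-" for
data, "A" for an attribute of the element that follows, and "C" when the
document was conforming. The SAMPLE lines are hand-written for
illustration, not captured from a real run, and the function names are
my own.

```python
# Hand-written sample in the assumed sgmls output style.
SAMPLE = """\
(PAPER
(TITLE
-On Markup
)TITLE
AID TOKEN A1
(ABSTRACT
-Markup makes structure explicit.
)ABSTRACT
)PAPER
C
"""

def read_esis(lines):
    """Yield (event, payload) tuples from sgmls-style output lines."""
    pending_attrs = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # Attribute lines come before the start line they belong to.
            name, _, value = rest.partition(" ")
            pending_attrs[name] = value
        elif code == "(":
            yield ("start", (rest, pending_attrs))
            pending_attrs = {}
        elif code == ")":
            yield ("end", rest)
        elif code == "-":
            # Escape sequences inside data are left undecoded in this sketch.
            yield ("data", rest)
        elif code == "C":
            yield ("conforming", None)
        # Any other line type is ignored here.

def text_of(lines, element):
    """Concatenate the character data found inside the named element."""
    depth, parts = 0, []
    for event, payload in read_esis(lines):
        if event == "start" and payload[0] == element:
            depth += 1
        elif event == "end" and payload == element:
            depth -= 1
        elif event == "data" and depth > 0:
            parts.append(payload)
    return "".join(parts)
```

For example, text_of(SAMPLE.splitlines(), "ABSTRACT") picks out the
abstract text. In practice the lines would come from a pipe or a file
written by the parser rather than from a string.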
07) Commercial software
=======================
This list is not complete. Omission from this list is through accident
or ignorance. A list of products is given, followed by a list of contact
names and addresses.
Note: NO VALUE OR "FITNESS FOR PURPOSE" JUDGEMENT IS PLACED ON ANY
PRODUCT OR SERVICE LISTED. ALWAYS CHECK WITH THE SUPPLIER TO ENSURE
THAT ANY PRODUCT (OR COMBINATION OF PRODUCTS) WILL DO WHAT YOU WANT,
AND WILL WORK WITH YOUR COMPUTER/OPERATING SYSTEM.
Note: "quasi-WYSIWYG" refers to the capability of some packages to format
the screen/paper output of an SGML document, such that the two output
representations are similar.
07.1 Product list
Agfa CAPS - for large-scale publishing. Contact: local Agfa Gevaert office.
Agfa SDMS - SGML-based document management system. Contact: local Agfa
Gevaert office.
Author/Editor - create/edit/validate SGML DTDs and documents (quasi-WYSIWYG).
Contact: SoftQuad Inc.
BASISplus - SGML-aware database system. Contact: Information Dimensions Inc.
DocuBuild SGML - for large-scale (CALS) publishing on DEC VAX.
Contact: Xerox Corporation.
DynaText - on-line SGML document indexing/searching/browsing. Contact:
Electronic Book Technologies
EASE - create/edit/validate SGML DTDs and documents. Contact: E2S
FastTAG - conversion/auto-tagging for scanned hardcopy and electronic
text files. Contact: Avalanche Development Company
FrameMaker - DTP with some (CALS?) SGML capabilities. Contact: Frame
Technology Corporation
Grif - create/edit/validate SGML DTDs and documents (quasi-WYSIWYG), inc.
graphics. Contact: Grif S.A.
Guide - Guide hypertext from SGML documents. Contact: Office Workstations
Limited (OWL)
HyMinder - HyTime engine (in production). Contact: TechnoTeacher, Inc.
Interleaf - DTP with some (CALS?) SGML capabilities. Contact: Interleaf
MacroTag - supports macros for using SGML with MS-Word 4.0 or
WordPerfect 5.0. Contact: Allen Creek Software.
Mark-It - validates SGML DTDs and documents. Contact: Sema Group Systems
Limited
SGML/CALS Translator - conversion/auto-tagging. Contact: Shaffstall
Corporation.
SGML-DB - an SGML-aware database system. Contact: A.I.S/Berger Levrault
SGML Hammer - conversion/auto-tagging for SGML documents.
Contact: Avalanche Development Company
SGML Publisher - create/edit/validate SGML DTDs and documents (quasi-
WYSIWYG), inc. graphics. Contact: ArborText Inc.
SGML/Search - an SGML-aware database system (being replaced by SGML-DB)
Contact: A.I.S/Berger Levrault
SGML Translator - validates SGML documents, translates SGML documents
to DCF also BookMaster, BookManager. Contact: IBM
SGML Translator - conversion/auto-tagging. Contact: Shaffstall Corporation.
SGML Toolchest - a set of tools to aid the production of SGML documents on
DEC/VAX machines. Contact: DEC
Silversmith - full-text retrieval system for SGML documents. Contact:
Taunton Engineering.
TABLETAG - conversion/auto-tagging for Lotus 1-2-3 spreadsheets to
CALS, AAP and Author/Editor tables. Contact: The Unifilt Company.
TagWorX - conversion/auto-tagging of scanned documents (within ScanWorX).
Contact: Xerox Imaging Systems.
TextWrite - create/edit SGML documents. Contact: IBM
TextWrite Tools - create/edit SGML DTDs. Contact: IBM
WordPerfect Markup - converts between (UNIX) WordPerfect 5.1 and SGML
documents. (In testing until late 1992; release due early 1993?).
Contact: WordPerfect
Write-It - create/edit SGML documents. Contact: Sema Group Systems Limited
WriterStation - create/edit SGML documents. Contact: Datalogics Inc.
XGML Omnimark - conversion/auto-tagging. Contact: Exoterica Corporation
XGML Validator - validates SGML DTDs and documents. Contact: Exoterica
Corporation
07.2 Contact list
A.I.S/Berger Levrault - 34 Av. du Roule, 92200 Neuilly, France;
Phone: +33-1-46-40-10-60; Fax: +33-1-46-40-18-44.
ArborText Inc. - 533 West William Street, Suite 300, Ann Arbor, MI 48103,
USA; Phone: +1-313-996-3566; Fax: +1-313-996-3573;
Email: sales@arbortext.com (Internet)
Agfa Gevaert - (Your local Agfa Gevaert office/supplier).
Allen Creek Software - Carol Kamm, 1209 West Huron, Ann Arbor, MI 48103,
USA; Phone: +1-313-663-4248.
Avalanche Development Company - Eileen Quirk, Director of Marketing and Sales,
Avalanche Development Company, 947 Walnut Street, Boulder, CO 80302,
USA; Phone: +1-303-449-5032; Fax: +1-303-449-3246;
Email: sales@avalanche.com (Internet)
Datalogics Inc. - 441 West Huron Street, Chicago, Illinois 60610, USA;
Phone: +1-312-266-4444.
DEC - (Your local Digital Equipment Corporation (DEC) supplier)
E2S - Ronny Verkest, Sales Manager, E2S, Moutstraat 100, B-9000 Gent,
Belgium; Phone: +32(91)-21-03-83; Fax: +32(91)-20-31-91;
Email: e2s@e2s.be (Internet)
Electronic Book Technologies - One Richmond Square, Providence,
RI 02906, USA; Phone: +1-401-421-9550; Fax: +1-401-421-9551
Exoterica Corporation - 1545 Carling Ave., Suite 404, Ottawa, Ontario,
Canada K1Z 8P9; Tel: 613-722-1700 (or 1-800-565-XGML
for product information); Fax: 613-722-5706.
Email: info@xgml.com (enquiries) (Internet)
Frame Technology Corporation - 2911 Zanker Road, San Jose, CA 95134,
USA; Phone: +1-408-433-3311
Grif S.A. - 2 Bd Vauban BP 266, 78053 St-Quentin-en-Yvelines, Cedex,
France; Phone: +33-1-30-60-75-10; Fax: +33-1-30-60-75-27
IBM - (Your local IBM office)
Information Dimensions Inc. - 5080 Tuttle Crossing Boulevard, Dublin,
Ohio 43017-3569, USA; Phone: 1-800-DATA-MGT; Fax: 614-761-7290
Interleaf - (Your nearest Interleaf supplier)
Office Workstations Limited (OWL) - Rosebank House, 144 Broughton Road,
Edinburgh EH7 4LE, UK; Phone: +44-31-557-5720; Fax: +44-31-557-5721.
Sema Group Systems Limited - Martin Bryan, Sema Group Systems Limited,
Avonbridge House, Bath Road, Chippenham, Wiltshire SN15 2BB, UK;
Phone: +44-249-656194; Fax: +44-249-655723
Shaffstall Corporation - Anthony L. Shaffstall, VP Sales, 7901 East 88th
Street, Indianapolis, IN 46256-1235, USA; Phone: +1-317-842-2077
SoftQuad Inc. - 56 Aberfoyle Crescent, Suite 810, Toronto, CANADA, M8X 2W4;
Phone: +1-416-239-4801; Fax: +1-416-239-7105; Email: mail@sq.com
(Internet)
Taunton Engineering - John Bottoms, Taunton Engineering Inc., 26 Westvale Road,
Concord, MA 01742-2935, USA.
TechnoTeacher, Inc. - Steve Newcomb, TechnoTeacher, Inc., 1810 High Road,
Tallahassee, FL 32303-4408, USA; Phone: +1-904-422-3574;
Fax: +1-904-386-2562.
The Unifilt Company - Michael Kless (President), PO Box 2528, Edison,
NJ 08817, USA. Phone: +1-908-225-2243; Fax: +1-908-225-2248.
WordPerfect - (Your local/national WordPerfect Corporation office)
Xerox Corporation - Publishing Marketing Manager, 10200 Willow Creek
Road, San Diego, CA 92131, USA. Phone: +1-619-695-7789;
Fax: +1-619-695-7710.
Xerox Imaging Systems - 9 Centennial Drive, Peabody, MA 01960, USA.
Phone: +1-508-977-2000; Fax: +1-508-977-2148.
(European office): Unit 8, Suttons Business Park, Reading,
RG6 1AZ, UK. Phone: +44-734-668421; Fax: +44-734-261913.
08) ftp archives
================
Many ftp archives now hold information on SGML and copies of public
domain software, DTDs etc. (see Robin Cover's on-line Bibliography, or
use ARCHIE to find your nearest ftp site). The main archives are:
ftp.ifi.uio.no [129.240.88.1] - University of Oslo, Norway
mailer.cc.fsu.edu [128.186.6.103] - Florida State University, USA
sgml1.ex.ac.uk [144.173.6.61] - Exeter University, UK
The ftp archive at Oslo is very well maintained. Some national archives
mirror the contents of one or more of those mentioned above.
Please try to be considerate when using ftp - it is a privilege not a right.
The setting up and maintenance of most archives is done on a voluntary
basis, using resources that are loaned by the site administrators.
09) The SGML Users' Group, National Chapters, SIGs, Standards bodies
====================================================================
The International SGML Users' Group was set up to promote the use of
SGML and represent the interests of SGML users on various international
bodies.
Membership of the International SGML Users' Group (or an affiliated
National Chapter or SIG), entitles you to receive the Users' Group Newsletter
and Bulletin, and discounts on various books and conferences.
For membership details, contact:
Mr Stephen G Downie
SoftQuad Inc.
56 Aberfoyle Crescent
Suite 810
Toronto, Ontario M8X 2W4
Canada
Phone: +1-416-239-4801
Fax: +1-416-239-7105
Activities and costs of joining a National Chapter or SIG vary greatly.
Please contact the appropriate person (see below) for more details. Other
CALS SIGs may exist, but I do not have any information about them; I will
list them if and when details are supplied.
09.1 National Chapters are listed below by country (with contact names):
Australia - (Just setting up. Contact: Nick Carr, PO Box R806, Sydney NSW,
Australia 2000; Phone: 612-262-4777; Fax 612-262-4774)
Canada - Dr Martin Levy (Chairman), Senior Director Regulatory Affairs,
Vice President Scientific Affairs, Fujisawa Pharmaceutical
Company, 7181 Woodbine Avenue, Suite 110, Markham, Ontario L3R 1A3
Canada; Phone: +1-416-470-7990; Fax: +1-416-470-7799.
France - (Just setting up. Watch this space!)
Germany - Dr Manfred Kruger, MID/Information Logistics Group GmbH,
Ringstrasse 15, 6900 Heidelberg, West Germany;
Phone: +49-6221-166-091; Fax: +49-6221-23921.
Japan - Mr Makoto Yoshioka, Research Fellow, Personal Systems Division,
Fujitsu Laboratories Ltd, 1015, Kamikodanaka Nakahara-Ku,
Kawasaki 211, Japan; Phone: +81-44-754-2690; Fax: +81-44-754-2594.
Netherlands - Mr Jan Maasdam, Samsom Uitgeverij, Postbus 4, 2400 MA
Alphen aan den Rijn, The Netherlands; Phone: +31-1720-66-612.
New Zealand - (See Australia)
Norway - Mr Jon Urdal, Fabritius A/S, Brobekkvn. 80, 0583 Oslo 5, Norway.
Phone: +47-2-636400; Fax: +47-2-636590; Email: ju@gi.no (Internet)
South East Asia - (See Australia)
Switzerland - Mr Jurgen De Jonghe, AS Division, CERN, Geneva 23,
Switzerland; Phone: +41-22-767-81-41; Fax: +41-22-782-47-20.
UK - Mr Nigel Bray, Database Publishing Systems Ltd., 608 Delta Business
Park, Great Western Way, Swindon, Wiltshire SN5 7XF, UK;
Phone: +44-793-512-515; Fax: +44-793-512-516.
USA (Colorado) - (Just setting up. Contact Brian Travis, Editor of <TAG>
who may have more information)
USA (New York) - Mr W Joseph Davidson, SGML Forum of New York, Bowling
Green Station, P.O. Box 803, New York, NY 10274-0803, USA;
Phone: +1-212-691-4463; Fax: +1-212-691-1821.
USA (Washington/Midwest) - Ms Beth Micksch, Datalogics Inc., 441 West Huron,
Chicago, IL 60611, USA; Phone: +1-312-266-3131;
Fax: +1-312-266-4473; Email: bem@dlogics.com (Internet)
09.2 Special Interest Groups (SIGs):
ATA SIG - Ms Dianne Kennedy, Datalogics Inc., 441 West Huron, Chicago,
IL 60611, USA; Phone: +1-312-266-4483; Fax: +1-312-266-4473;
Email: dkv@dlogics.com (Internet)
CALS in Europe SIG - David Ardron, Secretary, CALS in Europe SIG, Ferranti
Computer Systems Ltd., Western Road, Bracknell, Berkshire RG12 1RA,
UK; Phone: +44-344-483232; Fax: +44-344-54639
Database SIG - Mr Hans Mabelis, c/o Matrices Software, Westeinde 14,
1017 ZP Amsterdam, The Netherlands; Phone: +31-20-25-50-06;
Fax: +31-20-24-79-48.
European Workgroup on SGML (EWS) - Mr Holger Wendt, Springer-Verlag GmbH & Co.
KG, Postfach 105280, Tiergartenstrasse 17 6900, Heidelberg 1,
Germany; Phone: +49-6221-487-324; Fax: +49-6221-43982.
SGML SIGhyper (The SGML Users' Group SIG on Hypertext and Multimedia) -
Mr Steven R Newcomb, TechnoTeacher, Inc., 1810 High Road,
Tallahassee, FL 32303-4408, USA; Phone: +1-904-422-3574;
Fax: +1-904-386-2562.
09.3 Standards bodies:
American National Standards Institute (ANSI) - 1430 Broadway, New York,
NY 10018, USA. Phone: +1-212-642-4995.
British Standards Institution (BSI) - Linford Wood, Milton Keynes, MK14, UK.
International Organization for Standardization (ISO) - 1 Rue de Varembe,
Case Postale 56, CH-1211 Geneva 20, Switzerland.
10) Conferences
===============
The major SGML-related conferences are organized by the Graphic
Communications Association (GCA), at reduced rates for GCA members. Anyone
can attend, and for SGML '92 the GCA have introduced a discount rate for
representatives from academic institutions. To contact the GCA:
Graphic Communications Association
100 Daingerfield Road, 4th Fl.
Alexandria
VA 22314-2888
Phone: (703)-519-8157
Fax: (703)-548-2867
"International Markup" - Held annually since 1982 in Europe. General/all
SGML (and closely related issues). Next: 9-12 May 1993, Rotterdam.
"SGML" - Held annually in USA. General/all/specialist SGML (and closely
related issues). Generally more technical than "International Markup".
Next: 25-30 October 1992, Danvers (MA).
11) SGML Initiatives and major projects
=======================================
This section contains details on:
AAP / EPSIG
EWS
TEI
11.1) AAP / EPSIG - Association of American Publishers, Electronic
Publishing Special Interest Group. Responsible for producing, maintaining
and updating ANSI/NISO Z39.59-1988 (also known as the "AAP Standard"). The
AAP standard consists of a set of DTDs relating to the preparation and
markup of electronic manuscripts. The standard is currently undergoing
review, and specialist workgroups of interested volunteers are working on
DTDs for handling tables and mathematics (using two electronic discussion
lists - see the appropriate section, above). For information, contact:
Betsy Kiser (EPSIG Manager)
EPSIG
c/o OCLC
6565 Frantz Road
Dublin
Ohio 43017-0702
USA
Phone: (614)-764-6195
Fax: (614)-764-6096
11.2) EWS - European Workgroup on SGML. A collection of major European
publishers, typesetters, printers (and other interested parties), working
toward producing a DTD (or set of DTDs), suitable for publishing scientific
journals, papers etc. Initially based on, and still closely watching, work
on the AAP Standard. EWS DTD(s) currently known as the "MAJOUR DTD" - a
complete version of which is due out at the end of 1992 (a copy of the
header part of the DTD and accompanying handbook was distributed at
"International Markup '91"). For information, contact:
(See entry for EWS above - in section dealing with
Special Interest Groups).
11.3) TEI - The Text Encoding Initiative. An international research project
to develop and disseminate guidelines for the encoding and interchange of
machine-readable texts. Primarily concerned with taking existing texts,
and marking them up with SGML (so as to facilitate later study). The TEI
has several specialist committees and workgroups e.g. General Linguistics,
Spoken Texts, Historical Studies, Machine Readable Dictionaries, Computational
Lexica, Terminological Databases, Character Sets, Text Criticism, Hypertext
and Hypermedia, Mathematical Formulae and Tables, Language Corpora, Verse,
Performance Texts, Literary Prose. The TEI maintains two electronic
discussion lists (tei-l and sgml-l), two archives of related documentation
(reports, DTDs, and entity sets), and publishes widely. For more information,
contact one of the co-ordinators:
C. Michael Sperberg-McQueen
University of Illinois at Chicago
Computer Center (M/C 135)
Box 6998
Chicago IL 60680
USA
Phone: +1-312-996-2981
Fax: +1-312-996-6834
Email: U35395@uicvm.cc.uic.edu (Internet)
U35395@UICVM (Bitnet)
Lou Burnard
Oxford University Computing Service
13 Banbury Road
Oxford OX2 6NN
UK
Phone: +44-865-273-238
Fax: +44-865-273-275
Email: LOU@VAX.OX.AC.UK (Internet)
12) SGML and other Standards
============================
Some or all of the Standards mentioned below are described in much more
depth in Robin Cover's on-line Bibliography (users are advised to consult
this). Copies of International Standards are available through your national
standards body. Standards are listed below by acronym/title.
ASCII - (see Character Sets)
Character Sets -
Being non-proprietary and device-independent, SGML does not restrict users
to a particular character set. This is a complex area of SGML, and readers
are directed to ISO 8879, and the ftp archive of postings to comp.text.sgml
on this subject (at Oslo) for further information.
DSSSL - Document Style Semantics and Specification Language (ISO/IEC DIS
10179:1990).
The introduction to the Standard says DSSSL can be used "..for the
specification of document processing such as formatting and data management
functions, with the initial focus on formatting to both print and on display
media, and data conversion......The objective of the DSSSL Standard is to
provide a formal and rigorous means of expressing the range of document
production specifications, including high-quality typography, required
by the graphics arts industry." DSSSL is not yet a full International
Standard.
EBCDIC - (see Character Sets)
Graphics - (see Proprietary/de facto standards)
HyTime - Hypermedia/Time-based Structuring Language (ISO/IEC 10744)
(Extract from Robin Cover's Bibliography:) "HyTime is a standard neutral
markup language for representing hypertext, multimedia, hypermedia and
time- and space-based documents in terms of their logical structure. Its
purpose is to make hyperdocuments interoperable and maintainable over the
long term. HyTime can be used to represent documents containing any
combination of digital notations. HyTime is parsable as Standard Generalized
Markup Language..." HyTime was accepted as a full International Standard
in spring 1992.
ODA - Open (or "Office") Document Architecture (ISO 8613)
This is mentioned because ODA is often presented as if both it and SGML
existed in opposition to one another. This is not the case. ODA is a
complex standard, and is undergoing a thorough review; contact your national
standards body for more information. Papers have been published that
compare and contrast ODA and SGML (see Robin Cover's on-line Bibliography
for references). Note: some readers object to postings on ODA being
sent to comp.text.sgml (try comp.text instead).
Proprietary/de facto standards.
There is a common misconception that SGML exists in competition to some
major existing de facto and proprietary standards (such as PostScript
or TeX/LaTeX). This is not the case, and as you find out more about SGML
this should become self-evident (however, see SPDL). Note that SGML
enables the inclusion of data marked up with something other than SGML
(e.g. TeX, Encapsulated PostScript, a Lotus spreadsheet, CCITT/4,
CGM, TIFF etc.)
SDIF - SGML Document Interchange Format (ISO 9069:1988)
"A standard describing the interchange for documents enclosed with SGML"
(Eric Van Herwijnen, "Practical SGML")
SGML-B - (SGML Binary ?)
A standard for describing a compiled form of SGML (?). James David Mason
(Convenor, ISO/IEC JTC1/SC18/WG8), posted the following message to
comp.text.sgml (15 Aug 1992) "The official status of SGML-B is that it
is an approved work item in ISO/IEC JTC1/SC18/WG8, the group responsible
for SGML itself. The editors are Dr. Charles Goldfarb, the SGML project
leader, and Dr. David Abrahamson, of Trinity College, Dublin. The project
is being maintained as officially active, with the provision that it will
not be progressed until the current review and potential revision of SGML
itself is further along. Our intention is to make SGML-B reflect whatever
revisions we decide to incorporate into the base standard and then to
make it a part of the revised standard rather than something independent."
SMDL - Standard Music Description Language (ISO/IEC CD 10743)
(Extract from Robin Cover's Bibliography:) "..SMDL 'defines a
language for the representation of music information, either alone,
or in conjunction with text, graphics, or other information needed for
publishing or business purposes.' Multimedia time sequence information
is supported. SMDL is a HyTime application...." SMDL came before, and
was the source of inspiration for HyTime. Not to be confused with
SDML ("Standard Digital Markup Language"?) which is a proprietary standard.
SPDL - Standard Page Description Language (ISO/IEC DIS 10190:1991)
A Standard for mapping to (and possibly from) a description language for
output devices. Thus an SGML document might go through DSSSL- and SPDL-
conforming processes before being output on a printer. SPDL might be
seen to be competing with de facto standards such as PostScript.
13) Introductory questions with answers - by Erik Naggum
========================================================
Note: This section includes the bulk of the text posted by Erik Naggum
to comp.text.sgml as FAQ version 0.0 (Dec. 15 1992). It contains the
following sections, questions, and answers:
<SECTION>INTRODUCTORY QUESTIONS with answers.
<Q>What is SGML, briefly?
<Q>Can I read more about SGML somewhere?
<Q>SGML is often mentioned as being a "meta-language". What is that?
<Q>What does an SGML document look like?
<Q>What do you mean "my parser"? Are there any freely available ones?
<Q>Can I get the ARC SGML from somewhere electronically?
<Q>I've received an SGML document from a net.friend, what can I do with it?
<Q>I'm writing a book, and my publisher wants me to submit an SGML
document on a diskette, what do I do?
<SECTION>TECHNICAL QUESTIONS with answers
<Q>What, precisely, is an "element"?
<Q>What is an "entity" in SGML?
<SECTION>INTRODUCTORY QUESTIONS with answers.
<Q>What is SGML, briefly?
<A>SGML is an abbreviation for the "Standard Generalized Markup
Language". SGML is defined in an International Standard published by
the International Organization for Standardization (ISO), with
reference number ISO 8879:1986, bearing the full name "Information
processing -- Text and office systems -- Standard Generalized Markup
Language (SGML)".
To most people, _markup_ means an increase in the price of an article.
Although we talk about increases in value, it's not the same thing.
"Markup" is a term coming from the publishing and printing business,
where it means the instructions for the typesetter that were written
on a typescript or manuscript copy by an editor. Today, with your
favorite editor, you can enter the markup yourself, or even have it
entered for you, in terms of codes or other instructions for an
electronic typesetting program, which in simple cases is also the
editor. An example is troff's ".ce" for "center the following line".
A _markup_language_ is a set of means (constructs) to express how text
(i.e., that which is not markup) should be processed, or handled in
other ways. Unlike most other artificial languages, markup languages
have to deal with embedded data, and contain rules for what is markup
and what is data. For instance, in TeX the backslash means that
subsequent input is TeX instructions. Most markup languages offer
additional, administrative, language constructs, with which to define
other language constructs (such as macros).
_Generalized_markup_ is markup that has the curious property that it
does _not_ specify how things should look. We still call it markup,
though, because of the similarity with markup as described above. For
instance, "<Q>" and "<A>" are used in this FAQ to denote Question and
Answer, respectively. This doesn't say anything about how questions
should look in a typeset edition of this FAQ. You could have all the
questions rendered in bold-face, for instance. With generalized
markup, you tell the system _what_ you have, rather than how it should
look, and you do so by putting a label (tag) around the text. There
is a clear correlation between tags and what things look like. Tags
are placed at the start and at the end of text of a certain kind, and
these are precisely the places where typographic features are used,
such as spacing, change of typeface, etc. An example is LaTeX, which,
through macros, lets you talk about itemized lists, instead of indents,
item numbering, among other things.
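To make the "what, not how" point concrete, here is a minimal sketch
in Python. It is not an SGML tool: the regular expression, the sample
text, and the two style tables are assumptions made up for this
illustration. It shows the same <Q>/<A> labels being given two
different appearances without touching the labelled text.

```python
import re

# Toy input using the <Q>/<A> labels from this FAQ. Real SGML would be
# processed with a parser; the regular expression is for illustration only.
SOURCE = "<Q>What is SGML, briefly?<A>A language for defining markup languages."

def split_tagged(text):
    """Return (tag, content) pairs for text labelled with <Q> and <A>."""
    return re.findall(r"<(Q|A)>([^<]*)", text)

# Two hypothetical style tables: the same labels, two different appearances.
PLAIN_STYLE = {"Q": "Q: {}", "A": "A: {}"}
BOLD_STYLE = {"Q": "**{}**", "A": "    {}"}

def render(text, style):
    """Format each labelled piece according to the chosen style table."""
    return "\n".join(style[tag].format(content.strip())
                     for tag, content in split_tagged(text))

print(render(SOURCE, PLAIN_STYLE))
print(render(SOURCE, BOLD_STYLE))
```

Changing the look of every question means editing one style table; the
marked-up text itself stays as it is.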
The _Standard_Generalized_Markup_Language_ started out as a large set
of common tags, but it was soon discovered that this would be far too
large and still not big enough. So rather than try to outdo Sisyphus
in pointless and eternal tasks, SGML is a language which makes it
possible to roll your own generalized markup language, but with a
standard form and in standard ways. (In practice, you won't exactly
roll your own, any more than you design LaTeX packages on your own.
Although some people actually do that!) Central to the design of SGML
is the idea that a set of generic identifiers (the names of the tags),
together with their interrelationships, form a type (or class) of
documents, and that every document is an instance of a class, which
means it can be validated with respect to this class.
<Q>Can I read more about SGML somewhere?
<A>Let me suggest only one book, and then a bibliography. The book is
Charles F. Goldfarb: The SGML Handbook; Oxford University Press, 1990;
ISBN 0-19-853737-9. This book includes the text of the standard, so
you don't have to worry about finding out how to order it from your
ISO national member body or directly from ISO in Geneva, or wherever.
The main feature of this book is that Charles Goldfarb, who is the
project editor for the standard in ISO's SGML committee, has added a
tremendous amount of annotations and has provided links between parts
of the standard to guide your yearning for knowledge. Another big win
is the overview, which takes you through a guided tour of concepts and
facilities. If there be only one authority on SGML, this book is it.
A "paper hypertext" feature makes the links in the text easy to
follow. This is a book you need.
The bibliography is Robin Cover's Brief Bibliography, also to be
published on this newsgroup, and it covers the essentials, as well as
enough pointers to other works to fill a wall of literature. Robin
Cover, et alia, produced the huge, 312-page "Bibliography on SGML"
(Tech Report 91-299, Queen's University, Kingston, Ontario, Canada),
an incredibly useful work. Robin Cover continues to track the SGML
arena, and hopefully, he will continue to provide us with the fruits
of his work.
<Q>SGML is often mentioned as being a "meta-language". What is that?
<A>This refers to the fact that SGML isn't only one language, but a
language which describes other languages within its framework. As we
talked about classes of documents and every document being an instance
of such a class, we talk about a class of markup languages, and every
markup language being an instance of the class. SGML also has the
necessary expressive power to redefine the particular characters that
are to be considered markup in a particular markup language, so that
SGML is really a meta-language with an abstract syntax that each SGML
document fills in to get a concrete syntax and a particular markup
language for that document. This is the administrative information
that makes it possible to talk about "conformance" to SGML.
<Q>What does an SGML document look like?
<A>An SGML document is divided into three different parts, each with a
clearly defined function.
The first part specifies the character set of the document, which of
these characters have special meaning to SGML in the rest of the
document, and which advanced features are used. This is called the
"SGML declaration", and is like a list of ingredients on food, so you
know what to expect and what you can't eat. Using this as a check-
list, you can determine whether your system can handle the document at
hand. The SGML declaration begins like this:
<!SGML "ISO 8879:1986"
(There are cases where this might be absent. If the document uses all
the default features, and a concrete syntax defined in the standard as
the Reference Concrete Syntax, then nothing needs to be said. This is
chiefly useful within a local system.)
All this talk about an abstract syntax can be a little overwhelming,
so we'll use the Reference Concrete Syntax unless something is said to
the contrary.
The second part of an SGML document specifies the document type and
thereby the tags that can be used in the document, among a host of
other things. Most often, SGML documents have this part external to
the document itself, so it doesn't look big. Most users won't see the
many markup declarations, as they're called, that go into a document
type, so I'll leave it to the technical part of this FAQ. Anyway,
this part consists of document type declarations, and they start thus:
<!DOCTYPE
followed by the name of the document type, and its definition. A
simple definition of this FAQ could point to a file "faq", and then
the document type declaration would look like this:
<!DOCTYPE faq SYSTEM "faq">
(There can be several document types, and another construct called
link type declarations (similar to DOCTYPE, but with LINKTYPE).)
The third part of an SGML document is the marked-up "real" document
which all of the administrative information and legwork makes
possible. This is called the document instance. It usually begins
with the name of the document in angle brackets, like this:
<faq>
which is the syntax for a start-tag of an element. The corresponding
end-tag looks like this:
</faq>
When your parser reads your document, it checks that the tags in the
document belong to the document type, and that they are allowed where
they're used, again according to the document type. This process is
called "validation". When a document is validated, it does not need
to be so again no matter what your parser is instructed to do with it,
and no matter which application will use the data in the document.
This is another strength of SGML: application-independent validation.
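The checking described above can be sketched in a few lines of Python.
This is a toy under stated assumptions, not a parser: the table of
allowed children is a made-up stand-in for a real document type
definition, tag names are lower-case letters only, and every start-tag
and end-tag must be present (real SGML may allow some to be omitted).

```python
import re

# Hypothetical document type for a FAQ: which elements may appear
# directly inside which. This dictionary stands in for a real DTD.
ALLOWED_CHILDREN = {
    None: {"faq"},            # only <faq> may be the outermost element
    "faq": {"section"},
    "section": {"q", "a"},
    "q": set(),
    "a": set(),
}

def validate(document):
    """Return a list of error messages; an empty list means the check passed."""
    errors = []
    open_elements = []  # stack of currently open element names
    for closing, name in re.findall(r"<(/?)([a-z]+)>", document):
        if closing:
            if not open_elements or open_elements[-1] != name:
                errors.append("unexpected end-tag </%s>" % name)
            else:
                open_elements.pop()
            continue
        parent = open_elements[-1] if open_elements else None
        if name not in ALLOWED_CHILDREN:
            errors.append("<%s> is not in the document type" % name)
        elif name not in ALLOWED_CHILDREN.get(parent, set()):
            errors.append("<%s> is not allowed inside %s" % (name, parent))
        open_elements.append(name)
    for name in open_elements:
        errors.append("<%s> was never closed" % name)
    return errors
```

Note that the check needs only the document and the document type; it
knows nothing about what an application will later do with the data.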
<Q>What do you mean "my parser"? Are there any freely available ones?
<A>99% of the fun with SGML can be had only with a parser, so you do
need one. (The remaining 1% comes from beholding the elegance and
beauty of the language, and contemplating all the wondrous things you
can do with it, once you have a parser. This feeling tends not to
last, unless you're developing a parser, in which case it's almost all
the fun.) Fortunately, a competent programmer and SGML aficionado
has had a lot of fun lately, and in mid-July 1991, the ARC SGML parser
materials were released. The ARC SGML parser materials are legally
unencumbered (i.e., you can do whatever you want with it) and it's
available for a nominal cost from the SGML Users' Group, as well as
from several public SGML repositories.
<Q>Can I get the ARC SGML from somewhere electronically?
<A>The University of Oslo, Department of Informatics, kindly sponsors
a public FTP archive with material on SGML and has the ARC SGML parser
available for anonymous FTP. Both the original MS-DOS distribution
and a Unix port done by James Clark are available. This archive also
holds information on some standards related to SGML, most notably an
SGML application for hypermedia documents (the Hypermedia/Time-based
structuring language, HyTime). Take a look around in the SGML and
SIGhyper subdirectories. (Anonymous FTP works like this: You need to
be connected to the Internet, and need a program which can talk the
FTP protocol, usually something with "FTP" in it. On Unix systems,
you can say "ftp ftp.ifi.uio.no", and that should be it. You will be
asked for a user name -- reply "anonymous". You will then be asked
for a password -- reply with your Internet mail address. You're now
logged in, and can use the "cd" command to switch directories ("cdup"
to go one level up), and "ls" to look around. Use "get" to fetch
files.) If you need guidance, or can't use FTP, you may write to
<SIGhyper-request@ifi.uio.no>, which I'll try to answer as fast as
possible. There are also other FAQs available on how to FTP.
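For those who would rather script the session than type it, the same
steps can be sketched with Python's standard ftplib module. Treat this
as an illustration only: the host name is the one given above and may
no longer answer, and the directory and file names in the usage line
are placeholders, not a listing of what the archive holds.

```python
from ftplib import FTP

def fetch_from_archive(host, directory, filename, email, connect=FTP):
    """Log in anonymously, change directory, and return the file's bytes.

    `connect` is the factory used to open the connection; it defaults to
    ftplib.FTP and exists only so the function can be tried without a network.
    """
    ftp = connect(host)
    ftp.login("anonymous", email)   # user name "anonymous", mail address as password
    ftp.cwd(directory)              # the "cd" command
    chunks = []
    ftp.retrbinary("RETR " + filename, chunks.append)  # the "get" command
    ftp.quit()
    return b"".join(chunks)

# Usage (needs network access; "SGML" and "README" are placeholders):
# data = fetch_from_archive("ftp.ifi.uio.no", "SGML", "README", "you@example.org")
```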
<Q>I've received an SGML document from a net.friend, what can I do
with it?
<A>Didn't your net.friend tell you?? Seriously, an SGML document is,
as mentioned above, an instance of a document type, and a document
type can be many things, and it's only part of an application of SGML.
Such an application consists of several parts: First, there's the
document type definition, which says which elements you can have, and
how they interrelate. Second, with the document type definition,
there's a description of the semantics of the elements, so you know
what they mean. The description is needed because SGML is not
concerned with what things mean, only how they are represented. (You
might complain that this is too small, but it's better to do a given
task well than to do a greater task badly. There are other standards
in the great SGML family which take care of these things, and more are
coming as we witness increased adoption of SGML in the market.)
<Q>I'm writing a book, and my publisher wants me to submit an SGML
document on a diskette, what do I do?
<A>You take a look at one of the several SGML editing systems around,
and see which you think you would like to write a whole book with.
Recruit your publisher to help you understand what he wants, and try
to play with SGML a little before you start writing. SGML is like,
um, anyway, it gets better with experience, and can be frightening the
first time. For a good list of starter tools, I again refer you to
Robin Cover's brief bibliography for the details.
<SECTION>TECHNICAL QUESTIONS with answers
<Q>What, precisely, is an "element"?
<A>An element is the smallest part of a document that SGML deals with,
and it's the basic building block of document types. An element may
contain data (text), subelements, both, or it may be empty. The task
of a document type designer is to identify the elements a document is
to consist of, and define a hierarchical structure of these elements
by means of other elements. An element definition consists of the
name (generic identifier) which will be used in tags, a description of
the content (using a "content model"), and an indication of whether
the start-tags or the end-tags may be omitted. An element (in the
document instance) is indicated by a start-tag, the contents, and an
end-tag.
An element, with its notion of content models, provides a powerful
abstraction over the different kinds of text that can be found in a
document. For instance, ordinary text is just characters that will be
formatted somehow on output. If you have special kinds of text, such
as, for instance, a telephone number, it could make sense (depending
on your application) to make a special element with generic identifier
"phone". That way, you can look for telephone numbers and get matches
only at the right places. If you're really far-sighted, you would
define a telephone number notation you associated with this element,
so that you could check that all your phone numbers had the right
format. Then you could modify the presentation of a phone number to
suit a particular need, e.g. <phone>+1 516 555 8879</phone> in the
document could come out as "(516) 555 8879" in a domestic catalog and
with full, international format for an international catalog.
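A minimal Python sketch of that last step, the presentation, might
look as follows. The notation assumed here ("+", country code, area
code, and the rest, separated by spaces) and the function name are
inventions for this example; a real application would take both from
its own document type and processing system.

```python
import re

def render_phone(markup, home_country="1", international=False):
    """Render <phone>+CC AREA REST</phone> for a domestic or international catalog."""
    match = re.fullmatch(r"<phone>\+(\d+) (\d+) (.+)</phone>", markup)
    if match is None:
        raise ValueError("not a phone element in the assumed notation: %r" % markup)
    country, area, rest = match.groups()
    if international or country != home_country:
        return "+%s %s %s" % (country, area, rest)   # full international format
    return "(%s) %s" % (area, rest)                  # domestic format

print(render_phone("<phone>+1 516 555 8879</phone>"))
print(render_phone("<phone>+1 516 555 8879</phone>", international=True))
```

The document is written once; which of the two forms appears is decided
when the catalog is produced.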
In a way, elements are like concepts, where a concept (say, "beef") is
an abstraction over an innumerable lot of things into a particular
"type" of thing, all having common characteristics, and fits into a
hierarchy where concepts may be abstractions over other concepts.
This idea of "types" and of a conceptual tool for text is one of the
many great things with SGML. A content model is like the definition
of a concept, with the important difference that a content model is
defined in terms of the behavior of its subelements. A subelement may be
optional, required, or repeatable, and subelements may be chosen from
a set, form an ordered set, or form an unordered set. Then there are
exceptional subelements, which may either be forbidden or allowed
anywhere in the contents of the element.
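One way to get a feel for content models is to notice how much they
resemble regular expressions over subelement names. The Python sketch
below leans on that resemblance. It is a simplification under stated
assumptions: the model strings are written in the style of SGML model
groups with "," for an ordered sequence, "|" for a choice, and "?",
"*", "+" for optional and repeatable subelements, while the unordered
connector "&" and the exceptions mentioned above are left out.

```python
import re

def model_to_regex(content_model):
    """Translate a toy content model such as "(title?, (q, a)+)" into a regex.

    Supports "," (ordered sequence), "|" (choice), and the occurrence
    indicators "?", "*", "+". The "&" (unordered) connector is not handled.
    """
    parts = []
    for token in re.findall(r"[a-z]+|[(),|?*+]", content_model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue                          # a sequence is plain concatenation
        elif token.isalpha():
            parts.append("(?:%s )" % token)   # each child name ends with a space
        else:
            parts.append(token)               # ")", "|", "?", "*", "+" carry over
    return re.compile("".join(parts))

def children_match(content_model, children):
    """Check a list of subelement names against the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(content_model).fullmatch(sequence) is not None
```

With the model "(title?, (q, a)+)", the subelements title, q, a are
accepted, and so are q, a, q, a; a lone q, or an a before its q, is
rejected.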
The similarity between elements and concepts goes further, as elements
may have attributes. An attribute is information about an element
which is not part of its content.
The element in SGML is thus a high abstraction over identifiable,
separate portions of contents of a document from a conceptual and
hierarchical view.
<Q>What is an "entity" in SGML?
<A>The notion of an entity in SGML is an even higher abstraction than
the element, and since this is somewhat unexpected to most readers of
SGML, it's probably the reason why so many have problems with it.
The concept of an element comes from looking at the contents of a
document and grasping that the contents form an element structure, a
hierarchy of elements, and that the nature of each element can be
abstracted so that a content model can be defined which spans the
varied use of each subelement.
The concept of an entity comes from looking at the individual pieces of text
that make up a whole document, and realizing that these pieces are
independent of the element structure. E.g., a book may physically consist
of several files on the author's disks. The element structure of the book
spans all the disks and all the files, yet it's important to be able to
refer to the files. The both complicating and relieving aspect of this is
that we need to be able to refer to these pieces in a system- and
storage-independent way. This is where the entity saves us a lot of
trouble. Entities are named pieces of text.
The abstraction that causes some confusion is over what a "piece of
text" is, and, in particular, where it is found. We have looked at
external entities, that is, entities which, when we refer to them,
cause us to read a different file. We may also need to define short-
hand notations for things in a document without needing an external
file for every small piece of text. This means that entities have
types, as well. There are internal entities, entities that are useful
as short-hands for language constructs, entities that are text which
is not to be interpreted, etc, and external entities, entities that
are simply text, entities that are in a special notation, to be
interpreted by a special program, perhaps with parameters, entities
which constitute larger parts of the administrative functions of the
first and second part of the SGML document.
Moreover, entities may be used both by the administrative parts and
the user, and the user shouldn't have to worry about which entities
are used by the administrative functions he doesn't see. So, entities
come in two flavors, parameter entities and general entities.
An "entity", then, is an abstraction over several types of text that
you want to refer to by name. Once defined, you don't need to know
where it is found, or of what kind it is -- all (general) entities
look and feel the same to the user.
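As a rough illustration of "named pieces of text", here is a Python
sketch that expands general entity references of the form "&name;"
from a table of internal entities. The entity names and the table are
made up for the example, and the sketch covers internal entities only;
for an external entity, the lookup would read a file (or whatever the
system maps the name to) instead of a dictionary.

```python
import re

# Hypothetical internal general entities: name -> replacement text.
ENTITIES = {
    "sgml": "Standard Generalized Markup Language",
    "iso": "International Organization for Standardization",
}

REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

def expand_entities(text, entities, max_depth=10):
    """Replace &name; references with their text, allowing nested references."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undefined entity: " + name)
        return entities[name]

    for _ in range(max_depth):
        text, count = REFERENCE.subn(replace, text)
        if count == 0:          # nothing left to expand
            return text
    raise ValueError("entity references nested too deeply (possible loop)")

print(expand_entities("&sgml; is published by the &iso;.", ENTITIES))
```

The text that uses "&sgml;" neither knows nor cares where the
replacement text comes from, which is the point of the abstraction.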
14) Making comments/additions to this FAQ
=========================================
If you have any additional information, comments or questions, please
email either of the authors of this FAQ. (Producers and suppliers of
commercial software should note that we can only provide _very_ brief
details of their product). If you wish to include some information,
please indicate _clearly_ in your message where you think it should
go in the FAQ (i.e. what section etc).
Email: Erik Naggum <enag@no.uio.ifi>
Michael Popham <M.G.Popham@exeter.ac.uk>
Post: Erik Naggum, Naggum Software, Boks 1570, Vika, 0118 OSLO, Norway;
Michael Popham, The SGML Project, Computer Unit, University of Exeter,
Exeter EX4 4QE, UK.
Fax: (Michael Popham) +44-392-211630
=========================================================================
COPYRIGHT NOTICE - Users are free to distribute this information in
any form, PROVIDED THAT no charge (other than to cover reproduction
costs) is made, the authors are acknowledged, and the final section on
"Making comments/additions to this FAQ" and this notice are included
in all copies.
=========================================================================